CBT Campus' Online Skills Training Courses.

Introduction to Statistical Concepts

Core Statistical Concepts: An Overview of Statistics & Sampling

Course Number:
it_daitscdj_01_enus

Expected Duration (hours)
0.8

Lesson Objectives

Core Statistical Concepts: An Overview of Statistics & Sampling

discover the key concepts covered in this course
describe what statistics, populations, and samples are
recognize how metrics such as mean, median and mode describe data
recall what information is conveyed by measures such as standard deviation and variance
summarize the workings of a number of probability sampling techniques
outline how to create balanced samples from an imbalanced dataset
summarize the key concepts covered in this course

Overview/Description
With data now being one of the most valuable assets to tap into, the demand for data science skills increases by the day. Statistics and sampling are at the core of data science. Use this course as a theoretical introduction to using samples to reveal various statistics. Examine what exactly is meant by statistics and samples. Explore descriptive statistics, namely measures of central tendency and of dispersion. Study probability sampling techniques, including simple random sampling and cluster sampling. Investigate how undersampling and oversampling are used to generate more balanced datasets. Upon completion, you'll know the best way to use statistics and samples for your specific goals and needs.
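
As a rough illustration of the descriptive statistics named above (not taken from the course materials), the following sketch computes measures of central tendency and dispersion with Python's standard `statistics` module. The data values are made up for the example, and the dataset is treated as a whole population, which is why the population variants `pvariance` and `pstdev` are used.

```python
import statistics

# Hypothetical data, treated here as a complete population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Measures of dispersion (population versions, dividing by N).
variance = statistics.pvariance(data)  # average squared distance from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(mean, median, mode, variance, std_dev)
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (which divide by N - 1) would usually be the more appropriate choice.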

Target

Prerequisites: none

Core Statistical Concepts: Statistics & Sampling with Python

Course Number:
it_daitscdj_02_enus

Expected Duration (hours)
1.7

Lesson Objectives

Core Statistical Concepts: Statistics & Sampling with Python

discover the key concepts covered in this course
install the latest versions of pandas and visualization modules used to analyze data
load data from a CSV file into a pandas DataFrame and perform some initial analysis
calculate the mean and median of a distribution using your own function and compare it with the built-in pandas function
use Seaborn and Matplotlib to visualize a distribution and where the mean, median, and mode fit in
compute and visualize the standard deviation and variance of a distribution
implement simple random and stratified sampling on a data frame
use pandas to generate a sample using cluster and systematic sampling
create a balanced sample using random undersampling and oversampling
generate synthetic data in order to create a balanced sample using the Synthetic Minority Over-sampling Technique (SMOTE)
summarize the key concepts covered in this course

Overview/Description
Data is one of the most valuable assets a business has, but it's only as valuable as the methods used to interpret it. Data science, which at its core includes statistics and sampling, is the key to data interpretation. In this course, practice using the pandas library in Python to work with statistics and sampling. Practice loading data from a CSV file into a pandas DataFrame. Compute a variety of statistics on data. While doing so, see how to visualize the relationship between data and computed statistics. Moving along, implement several sampling techniques, such as stratified sampling and cluster sampling. Then, explore how a balanced sample can be created from an imbalanced dataset using the imblearn module in Python. Upon completion, you'll be able to generate samples and compute statistics using various tools and methods.
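
The sampling techniques mentioned above might look roughly like the following in pandas. This is a minimal sketch under stated assumptions, not the course's own code: the DataFrame, its column names (`value`, `label`), and the sample sizes are hypothetical, and the random undersampling and oversampling are done with plain pandas rather than with the imblearn module the course uses (SMOTE is not shown).

```python
import pandas as pd

# Hypothetical imbalanced dataset: 8 rows of class "a", 2 rows of class "b".
df = pd.DataFrame({
    "value": range(10),
    "label": ["a"] * 8 + ["b"] * 2,
})

# Simple random sampling: every row has the same chance of being selected.
simple = df.sample(n=4, random_state=0)

# Stratified sampling: draw the same fraction from each class,
# so the sample keeps the original class ratio.
stratified = df.groupby("label").sample(frac=0.5, random_state=0)

# Systematic sampling: take every third row, starting from the first.
systematic = df.iloc[::3]

# Random undersampling: shrink every class to the size of the rarest one.
minority_size = df["label"].value_counts().min()
undersampled = df.groupby("label").sample(n=minority_size, random_state=0)

# Random oversampling: resample every class, with replacement,
# up to the size of the most common one.
majority_size = df["label"].value_counts().max()
oversampled = df.groupby("label").sample(
    n=majority_size, replace=True, random_state=0
)

print(undersampled["label"].value_counts())
print(oversampled["label"].value_counts())
```

Fixing `random_state` makes the draws repeatable, which helps when comparing statistics computed on different samples.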

Target

Prerequisites: none

Final Exam: Statistics and Probability

Course Number:
it_feemds_02_enus

Expected Duration (hours)
0.0

Lesson Objectives

Final Exam: Statistics and Probability

analyze and visualize data using box plots
analyze a uniform distribution by using cumulative distribution and probability density functions
apply Poisson distributions to make estimates in real-life situations
calculate and visualize confidence intervals using Python
calculate joint probabilities associated with the rolling of a die
calculate the joint probability of dependent variables
calculate the mean and median of a distribution using your own function and compare it with the built-in pandas function
compare and contrast type I and type II errors in hypothesis testing
compute conditional probabilities
create a balanced sample using random undersampling and oversampling
create a function to manually perform a T-test
create naive Bayes models in Python
define a Bayesian model in Python
define and understand the Bayes theorem
define descriptive and inferential statistics
define joint, marginal, and conditional probability
define terms such as event, outcome, and experiment
define the formula of the expected value of a random variable
describe binomial distributions and generate one using SciPy
describe different types of probability distributions and where they occur
describe normal distributions and their characteristics
describe the fundamentals of hypothesis testing
describe type I and type II errors
describe what statistics, populations, and samples are
estimate a population's mean with confidence intervals
explain the law of large numbers programmatically
explore one-sided and two-sided T-tests
explore probabilities associated with a Bayesian model
explore the probability tables of nodes in a Bayesian network
import Python libraries needed to work with probabilities
interpret p-values using alpha levels
load data from a CSV file into a pandas DataFrame and perform some initial analysis
outline the use of one-way ANOVA analysis
outline the use of the two-way ANOVA analysis
perform the paired T-test on paired samples
perform the Wilcoxon signed-rank test to compare medians
perform T-tests on real-world data
predict values with Bayesian models
recall the assumptions of the two-sample T-test
recall the symmetrical features of normal distributions
recognize how data is distributed using histograms and violin plots
recognize how metrics such as mean, median and mode describe data
recognize the use of the Mann-Whitney U-test
recognize when Welch’s T-test should be used
recount binomial distributions and generate one using SciPy
set up null and alternative hypotheses for statistical tests
simulate the flipping of a coin in Python
simulate the rolling of two dice to test joint probability
summarize the workings of a number of probability sampling techniques
test medians using the Wilcoxon signed-rank test
use Levene’s test to check for equal variances
use Poisson distributions to make estimates in real-life situations
use Seaborn and Matplotlib to visualize a distribution and where the mean, median, and mode fit in
use the Mann-Whitney U-test
use the non-parametric Kruskal-Wallis test
use the two-sample T-test to compare means
use Welch’s T-test to compare means
use Tukey’s HSD to know which categories differ significantly
use two-way ANOVA with interaction between the independent variables

Overview/Description

Final Exam: Statistics and Probability will test your knowledge and application of the topics presented throughout the Statistics and Probability track of the Skillsoft Aspire Essential Math for Data Science Journey.

Target

Prerequisites: none